Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents
Abstract
This paper describes a method for acquiring hyponyms of given hypernyms from HTML documents on the WWW. We assume that a heading (or explanation) of an itemization (or listing) in an HTML document is likely to contain a hypernym of the items in the itemization, and we try to acquire hyponymy relations based on this assumption. Our method extends Shinzato’s method (Shinzato and Torisawa, 2004), which uses statistical measures to obtain a common hypernym for the expressions in an itemization in an HTML document. Using Japanese HTML documents, we show empirically that the proposed method obtains a significant number of hyponymy relations that alternative methods would miss.
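The core assumption above (a heading that precedes an itemization tends to contain a hypernym of the listed items) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the statistical measures involved, so the sketch substitutes plain case-insensitive substring matching between a prespecified hypernym and the heading text. The names `HeadingListParser` and `candidate_hyponyms` and the English sample page are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
LISTS = {"ul", "ol"}


class HeadingListParser(HTMLParser):
    """Pair each top-level list with the most recent heading before it.

    Simplification: nested lists are not handled properly.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []          # (heading text, [item texts]) per list
        self.last_heading = ""   # text of the most recent heading
        self._buf = None         # text buffer for an open heading or <li>
        self._items = None       # items of the list currently open

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or (tag == "li" and self._items is not None):
            self._buf = []
        elif tag in LISTS and self._items is None:
            self._items = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._buf is not None:
            self.last_heading = " ".join("".join(self._buf).split())
            self._buf = None
        elif tag == "li" and self._buf is not None and self._items is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self._items.append(text)
            self._buf = None
        elif tag in LISTS and self._items is not None:
            self.pairs.append((self.last_heading, self._items))
            self._items = None


def candidate_hyponyms(html, hypernyms):
    """Return {hypernym: [list items]} for lists whose heading mentions it.

    Assumption: substring matching stands in for the paper's statistics.
    """
    parser = HeadingListParser()
    parser.feed(html)
    found = {}
    for heading, items in parser.pairs:
        for hyper in hypernyms:
            if hyper.lower() in heading.lower():
                found.setdefault(hyper, []).extend(items)
    return found


if __name__ == "__main__":
    page = (
        "<h2>Popular programming languages</h2>"
        "<ul><li>Python</li><li> Haskell </li></ul>"
        "<h2>Contact</h2><ul><li>Email</li></ul>"
    )
    print(candidate_hyponyms(page, ["programming language"]))
```

On the sample page, only the first list is kept, because its heading mentions the given hypernym; the items under "Contact" are ignored. A real system would need a filtering or ranking step in place of the substring test, since many headings mention a term without being a hypernym of the items below them.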
Similar resources
Using Relational Adjectives for Extracting Hyponyms from Medical Texts
We present a method for extracting hyponyms and hypernyms from analytical definitions, focusing on the relation observed between hypernyms and relational adjectives (e.g., cardiovascular disease). These adjectives introduce a set of specialized features according to a categorization proper to a particular knowledge domain. For detecting these sequences of hypernyms associated with relational adjec...
Acquisition of Hypernyms and Hyponyms from the WWW
Recently research in automatic ontology construction has become a hot topic, because of the vision that ontology will be the core component to realize the semantic web. This paper presents a method to automatically construct ontology by mining the web. We introduce an algorithm to automatically acquire hypernyms and hyponyms for any given lexical term using search engine and natural language pr...
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech...
Using Lexical Patterns for Extracting Hyponyms from the Web
This paper describes a method for extracting hyponyms from free text. In particular it explores two main matters. On the one hand, the possibility of reaching favorable results using only lexical extraction patterns. On the other hand, the usefulness of measuring the instance’s confidences based on the pattern’s confidences, and vice versa. Experimental results are encouraging because they show...
A Phrase-based Ontology Enabled Semantic Processing System for Web Search
Semantic processing system (SPS) is a system that performs phrase search of web content. SPS takes a user query in natural language, converts it to a keyword query, expands the keyword query with synonyms, hypernyms, hyponyms, and meronyms, and presents the keyword query to a search engine. SPS then sifts through the search engine result pages extracting grammatical and semantic information fro...